Throughput-Optimal Topology Design for Cross-Silo Federated Learning
Federated learning usually employs a client-server architecture in which an orchestrator iteratively aggregates model updates from remote clients and pushes a refined model back to them. This approach may be inefficient in cross-silo settings, as nearby data silos with high-speed access links may exchange information with each other faster than with the orchestrator, and the orchestrator may become a communication bottleneck. In this paper we define the problem of topology design for cross-silo federated learning, using the theory of max-plus linear systems to compute the system throughput (the number of communication rounds per time unit). We also propose practical algorithms that, given measurable network characteristics, find a topology with the largest throughput or with provable throughput guarantees. Speedups are even larger with slower access links.
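The max-plus connection in the abstract can be made concrete: for a max-plus linear system, the asymptotic cycle time (the reciprocal of the throughput) equals the maximum cycle mean of the underlying weighted delay graph. Below is a minimal sketch that computes this quantity with Karp's algorithm; the function name, node labels, and toy delay values are illustrative assumptions, not taken from the paper.

```python
def max_cycle_mean(n, edges):
    """Karp's algorithm for the maximum cycle mean of a directed graph.

    n      -- number of nodes, labeled 0..n-1
    edges  -- list of (u, v, w): an edge u -> v with delay w
    Assumes the graph is strongly connected. In the max-plus view, the
    returned value is the cycle time, and 1 / result is the throughput
    (communication rounds per time unit).
    """
    NEG = float("-inf")
    # D[k][v] = maximum total weight of a walk with exactly k edges
    # from node 0 to node v (NEG if no such walk exists).
    D = [[NEG] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] != NEG:
                D[k][v] = max(D[k][v], D[k - 1][u] + w)
    # Karp: max cycle mean = max over v of min over k < n of
    #       (D[n][v] - D[k][v]) / (n - k)
    best = NEG
    for v in range(n):
        if D[n][v] == NEG:
            continue
        vals = [(D[n][v] - D[k][v]) / (n - k)
                for k in range(n) if D[k][v] != NEG]
        if vals:
            best = max(best, min(vals))
    return best
```

For example, three silos connected in a ring with per-link delays 2, 3, and 4 have a single cycle of mean (2 + 3 + 4) / 3 = 3, so the cycle time is 3 and the throughput is 1/3 of a round per time unit.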
Review for NeurIPS paper: Throughput-Optimal Topology Design for Cross-Silo Federated Learning
In general the paper reads well; I have only minor comments regarding clarity of presentation. I appreciate that the authors are upfront about which scenarios this work applies to and which it does not. Section 1 grounds the rest of the work very well in prior research. In particular, in contrast to some related recent works, I feel this thorough grounding helps the authors ask better questions, and in this sense this work could help inspire further research. L65: I am not sure how compression plays into the preference for synchronous algorithms. It is relevant, though, including for the precise topic studied here. A perhaps-missing reference is Caldas et al., "Expanding the Reach of Federated Learning by Reducing Client Resource Requirements", which performs both model and update compression as discussed in the text.
Meta-review for NeurIPS paper: Throughput-Optimal Topology Design for Cross-Silo Federated Learning
The paper proposes methods for designing the communication graph for decentralized periodic averaging SGD (DPASGD) in the federated learning setup, focusing on reducing the per-iteration complexity (cycle time). The reviewers were very appreciative of the strong system and experimental design aspects of the paper, which account for various types of delays in realistic scenarios. I would like to thank the authors for their effort. The reviewers were quite engaged and have provided much useful feedback, and I hope it will be used to improve the paper. In particular, I would like to comment on a few points -- please see the full reviews for details:
- Although the authors motivate the need for focusing on cycle time over convergence rate in the introduction, based on the reviews, I believe it would be useful to include this discussion explicitly as a highlighted paragraph or subsection (see also comments by R2 on the digraph constraint).
- I would also encourage you to consider the title change suggested by R2 (or something similar), as I and the other reviewers agree that the current title is too generic.